Question 1

Written Memo: AI and Fundamental Investing

How will AI change fundamental investing? Two-page maximum covering workflow changes, adaptation, new risks, and product value.

Sources & References

Zeus Podcast: Smallwood on AI in Research

Databricks: Jefferies AI Rollout (250+ users)

CFA Institute: Outperformed by AI?

PineBridge: Alpha in the AI Age (2026)

Seeking Alpha: The Great Commoditization

AI Hallucination Statistics 2026 (1-in-277)

Harvard: AI Hallucination Framework

FinRobot: Open-Source AI Research Agent

How Will AI Change Fundamental Investing?

Written Memo — Primer Candidate Assessment

Charlie Henderson · May 2026

This memo was produced using the same multi-model, cross-referencing approach it argues analysts should adopt. The process is documented below as both transparency and proof of method.

Podcast Audio → Whisper Transcription → Thesis Extraction → Web Research (3x) → 4-Model Debate → Synthesis & Drafting

Step 1: Podcast Transcription & Thesis Extraction

Source: Zeus Capital podcast — "How AI is Transforming Equity Research Workflows" with Alistair Smallwood, Head of Applied AI at Primer (15 April 2026, 63 minutes)

Downloaded the MP3 from Transistor.fm, split into 7 chunks (each under the 25MB Whisper API limit), and transcribed all chunks via the Groq Whisper API (whisper-large-v3). Full transcript: 60,539 characters.
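
The chunking step can be sketched roughly as follows. This is an illustration, not the script actually used: `plan_chunks` is a hypothetical helper, the 150MB file size is an assumed figure (the memo does not state the MP3 size) chosen only so the example reproduces seven chunks, and the calculation assumes a roughly constant bitrate so that bytes scale with duration.

```python
import math

WHISPER_LIMIT_BYTES = 25 * 1024 * 1024  # 25MB per-request limit cited above


def plan_chunks(total_seconds: float, total_bytes: int,
                limit_bytes: int = WHISPER_LIMIT_BYTES,
                headroom: float = 0.9) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds so each chunk stays under the limit.

    Assumes constant bitrate; `headroom` keeps a safety margin below the limit.
    """
    if total_seconds <= 0 or total_bytes <= 0:
        raise ValueError("duration and size must be positive")
    n_chunks = max(1, math.ceil(total_bytes / (limit_bytes * headroom)))
    step = total_seconds / n_chunks
    return [(i * step, min(total_seconds, (i + 1) * step)) for i in range(n_chunks)]


# Assumed inputs: 63-minute episode, hypothetical 150MB file.
chunks = plan_chunks(63 * 60, 150 * 1024 * 1024)
```

Each `(start, end)` pair would then be passed to whatever audio tool performs the actual split before transcription.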

Key positions extracted from the podcast:

AI commoditizes information but not workflow or judgment — "there's no corpus of text that explains what to do when you see reverse factoring"
70% of stock trading is fundamental, 30% is behavioral overlay — the behavioral edge remains human
Short-term alpha gets harder (quarterly earnings become "damp squibs" as everyone has same info), but long-term thinking (2yr+ horizons) becomes more valuable
LLMs are "hard programmed to choose the middle of the bell curve" — they output consensus by design
The scaling thesis: juniors covering 50 companies instead of 20, learning rate becomes exponential. "Teams of 3 become teams of 10, where the majority are non-human"
Capital formation could shift away from multi-manager back to single manager / longer-term thinking
Agent quality compounds via memory — inherits the analyst's mental model of a company over time

Step 2: Parallel Web Research

Search 1: AI in equity research workflows

Jefferies AI rollout to 250+ users, CFA Institute "Outperformed by AI" analysis, PwC 2026 predictions. Key finding: work that took days now takes minutes, but generating novel insights remains human territory.

Search 2: AI hallucination and bias in research

Harvard/MIT research on fabrication rates (fabricated citations in 1-in-277 papers by early 2026, a 6x increase since 2023); 51% of AI-adopting organisations report negative consequences from inaccuracy.

Search 3: Alpha compression and information commoditization

PineBridge "Alpha in the AI Age", Seeking Alpha "The Great Commoditization", CFA qualitative data thesis. Key finding: quantitative edge diminishing; next frontier is qualitative analysis and AI application quality.

Step 3: Multi-Model Debate

Format: Structured debate across 4 independent models

Claude Sonnet 4.6, GPT-5.4, DeepSeek V4 Pro, and Gemini 2.5 Pro each independently argued the thesis vs counter-thesis, then Gemini synthesised the verdict. Total cost: ~$0.11.

Debate result: All four models rejected the "cover more stocks" counter-thesis. The split was between evolutionary adaptation (GPT and DeepSeek: "augment existing analysts") and revolutionary transformation (Sonnet and Gemini: "the role itself must change"). The synthesis sided with transformation. Key contribution from the debate: Gemini's framing of a valuable AI product as a "Contradiction Engine" rather than a summarisation engine, which I adopted directly.

Where AI Helped — and Where It Didn't

AI helped with: Research breadth (processing 63 minutes of audio, finding current statistics across three domains, running four parallel model perspectives on the same thesis). The multi-model debate surfaced two blind spots I incorporated: the primacy of proprietary data over proprietary models, and the organisational immune response to role transformation.

AI did not help with: The central thesis — that the analyst role must shift from processing to validation — is a judgment call informed by experience, not a model output. The models were split on how radical the transformation should be; the decision to take the stronger position was mine. The models also uniformly underestimated the organisational difficulty of this transition, which is where human experience of institutional change matters most.

This process is itself a demonstration of the method this memo advocates: no single AI output was trusted at face value. The final document is a human synthesis of multiple flawed AI outputs, cross-referenced against primary source material.

Central Argument
The traditional equity analyst, trained to build models, read filings, and synthesise commentary, is now performing tasks that AI does faster and more accurately. These skills have not become worthless; they have become table stakes. The analyst role must shift from information processing to information validation. The next-generation analyst is not someone who reads a 10-K. They are someone who can identify when the AI's reading of a 10-K is wrong, incomplete, or structurally biased.

What parts of the fundamental investing workflow will change most?

Modelling becomes automated. Information asymmetry shifts from "who read the filing" to "who spotted the AI's error." Short-term alpha compresses; long-term thinking wins.

Financial modelling becomes automated but requires adversarial oversight. An AI agent can build a three-statement model from SEC filings in minutes. Jefferies has rolled out AI research tools to 250+ users, turning days of work into minutes. But models are only as good as the assumptions embedded within them, and LLMs default to the statistical centre of their training distribution. The analyst's job is no longer to build the model; it is to stress-test the model's assumptions against what the AI cannot see: management credibility, competitive dynamics in flux, and non-linear strategic shifts that defy historical pattern-matching.

Information asymmetry shifts rather than disappears. When every analyst had to manually read Note 150 in the Report and Accounts, spotting aging receivables or reverse factoring was a genuine edge, because most competitors simply would not do the work. AI eliminates that asymmetry overnight. The new asymmetry is meta-cognitive: understanding what the AI knows, what it missed, and what it confidently hallucinated. The hallucination statistics found in Step 2 put fabricated citations at 1-in-277 academic papers by early 2026, a sixfold increase since 2023. Financial analysis is not immune to this failure mode.

Short-term alpha compresses. As AI democratises quarterly earnings analysis, the range of consensus outcomes narrows. Quarterly trading becomes a damp squib. The durable edge moves to two-year-plus horizons — regime change, management evolution, product pivots — where Bayesian reasoning about non-linear outcomes remains beyond current AI capability.

What parts will not change?

The behavioural overlay (~30% of how stocks trade) remains irreducibly human. Conviction and risk management cannot be outsourced to a model.

The behavioural overlay remains irreducibly human. Roughly 30% of how a stock trades is driven by behavioural dynamics: what is priced in, what the market believes it knows, and where positioning creates fragility. Determining "what is known and what is not known" — the core skill of active management — becomes harder, not easier, when AI is involved. Previously, you were second-guessing other humans; now you must second-guess AI agents whose reasoning is opaque and whose outputs are correlated across firms.

Conviction and risk management cannot be outsourced. An AI can generate a thesis. It cannot feel the weight of capital at risk. Portfolio construction, position sizing under uncertainty, and the discipline to hold (or cut) through volatility remain fundamentally human functions. No amount of AI sophistication changes the fact that investment is a decision made under irreducible uncertainty.

The value of management interaction shifts but persists. Management meetings are no longer about listening and taking notes — AI does that better. The new value is in asking the one question the AI cannot formulate: "Our models show a divergence between your stated CapEx priorities and your recent engineering hires in a non-core division. Can you explain this strategic ambiguity?" The analyst uses AI's complete data synthesis to identify the unknown unknowns, then uses scarce human interaction to probe those specific gaps.

How should analysts, PMs, and research teams adapt?

The next-generation analyst is an "AI Output Analyst" — trained in the epistemology of machine-generated knowledge, not the mechanics of data extraction.

The core thesis: analysts must become more developer than financial modeller. The traditional analyst skillset — Excel modelling, note-reading, management meeting attendance — is now table stakes that AI performs faster and more accurately. The scarce skill is no longer the ability to build a model; it is the ability to build the system that builds, verifies, and challenges the model. This is a fundamentally different competency, closer to software engineering than to traditional finance.

This requires four new competencies:

1. Agentic Workflow Design — the Analyst as Systems Architect
The ability to construct multi-step AI pipelines that cross-reference information, not just summarise it. For example: Agent 1 extracts supplier risk factors from peer 10-Ks; Agent 2 compares these against the target company's own disclosures; Agent 3 highlights discrepancies and formulates questions for management. This is systems design, not spreadsheet work. The analyst who can write a prompt chain that replicates a week of manual research in 10 minutes has an order-of-magnitude advantage over one who can build a slightly better Excel model.
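
The three-agent example can be sketched in miniature. This is a toy under stated assumptions: the function names are hypothetical, keyword matching stands in for what would be LLM extraction calls, and the filing snippets and vocabulary are invented for illustration.

```python
from collections import Counter


def extract_risks(filing_text: str, vocabulary: set[str]) -> set[str]:
    """Agent 1 stand-in: keyword match instead of an LLM extraction call."""
    text = filing_text.lower()
    return {term for term in vocabulary if term in text}


def peer_gaps(target_risks: set[str], peer_risks: list[set[str]],
              min_peers: int = 2) -> dict[str, int]:
    """Agent 2: risks flagged by at least `min_peers` peers but absent from the target."""
    counts = Counter(risk for risks in peer_risks for risk in risks)
    return {r: n for r, n in counts.items() if n >= min_peers and r not in target_risks}


def management_questions(gaps: dict[str, int]) -> list[str]:
    """Agent 3: turn each discrepancy into a question for management."""
    return [
        f"{n} peers disclose '{risk}' as a risk factor; why is it absent from your filing?"
        for risk, n in sorted(gaps.items())
    ]


VOCAB = {"supply chain", "single supplier", "fx exposure"}  # illustrative only
peers = [
    extract_risks("Supply chain disruption and FX exposure remain risks.", VOCAB),
    extract_risks("We rely on a single supplier; supply chain risk is elevated.", VOCAB),
    extract_risks("Supply chain costs rose.", VOCAB),
]
target = extract_risks("FX exposure is hedged.", VOCAB)
questions = management_questions(peer_gaps(target, peers))
```

The point of the structure is that each stage has a narrow, checkable output, so an error in extraction can be audited separately from an error in comparison.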

2. Multi-Model Orchestration — Divergence as Signal
Running the same question through multiple model architectures (GPT, Claude, DeepSeek) and systematically identifying where they diverge. Each model is trained on different data corpora with different cutoff dates, biases, and failure modes. When they agree, confidence is high. When they diverge, that divergence is the most valuable signal in the analysis — it reveals where the training data is thin, the reasoning is ambiguous, or the question is genuinely hard. This is closer to forensic auditing than traditional research, and it requires understanding how models work, not just what they output.
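
For questions with a numeric answer, divergence can be measured directly. The sketch below assumes a hypothetical `divergence_report` helper and invented model answers; it flags any model whose answer sits more than a chosen tolerance away from the median. Free-text answers would need a different comparison (for example, claim-level matching), which is not shown.

```python
from statistics import median


def divergence_report(estimates: dict[str, float], tolerance: float = 0.05) -> dict:
    """Flag models whose numeric answer deviates from the median by more than `tolerance`."""
    if not estimates:
        raise ValueError("need at least one estimate")
    mid = median(estimates.values())
    if mid == 0:
        raise ValueError("median is zero; relative deviation is undefined")
    deviations = {m: abs(v - mid) / abs(mid) for m, v in estimates.items()}
    outliers = sorted(m for m, d in deviations.items() if d > tolerance)
    return {"median": mid, "outliers": outliers, "agree": not outliers}


# Hypothetical answers to the same question from three models (pence per share).
report = divergence_report({"model_a": 7.4, "model_b": 7.3, "model_c": 9.5})
```

An outlier is a prompt for investigation, not proof of error: the dissenting model may be the one that is right.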

3. Data Pipeline Construction — API Integration Over Manual Research
The next-generation analyst pulls balance sheet data from FMP, consensus from MarketScreener, filings from EDGAR, and news from Serper — not by visiting websites, but by building automated pipelines that feed verified data into their analysis. The analyst who can write a script to pull 3-year gross margins for a peer group in 30 seconds replaces the one who spends 3 hours copying numbers from annual reports. This is not optional technical literacy; it is the primary skill.
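
A minimal sketch of the peer-group margin pull, with the data source injected as a function so that no particular vendor API is assumed. `gross_margins` and `fake_fetch` are hypothetical names, the row shape is an assumption, and the figures in `fake_fetch` are made up; a real pipeline would swap in a client for whichever provider is licensed.

```python
from typing import Callable

# A fetcher returns rows like {"year": 2026, "revenue": ..., "gross_profit": ...}.
Fetcher = Callable[[str], list[dict]]


def gross_margins(tickers: list[str], fetch: Fetcher,
                  years: int = 3) -> dict[str, dict[int, float]]:
    """Return {ticker: {fiscal_year: gross margin %}} for the most recent `years` rows."""
    table: dict[str, dict[int, float]] = {}
    for ticker in tickers:
        rows = sorted(fetch(ticker), key=lambda r: r["year"], reverse=True)[:years]
        table[ticker] = {
            r["year"]: round(100 * r["gross_profit"] / r["revenue"], 1)
            for r in rows if r["revenue"]  # skip rows with missing or zero revenue
        }
    return table


def fake_fetch(ticker: str) -> list[dict]:
    """Stand-in for a real API client; figures are invented for illustration."""
    return [
        {"year": 2024, "revenue": 1000.0, "gross_profit": 470.0},
        {"year": 2025, "revenue": 1100.0, "gross_profit": 506.0},
        {"year": 2026, "revenue": 1200.0, "gross_profit": 540.0},
    ]


margins = gross_margins(["AAA", "BBB"], fake_fetch)
```

Injecting the fetcher also makes the pipeline testable offline, which matters when the output feeds an investment view.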

4. Bias and Structure Recognition
All information is structured with biases — both intentional (management framing, selective disclosure) and unintentional (training data gaps, model guardrails). The analyst must understand these biases at both the company reporting level and the AI model level, and synthesise a correct output from multiple flawed inputs. Understanding why GPT and Claude produce different answers to the same question is as important as understanding why a company's adjusted EBITDA differs from statutory.

The hiring implication: The next analyst hire should be evaluated on their ability to construct an agentic workflow, not on their ability to build a DCF in Excel. PMs should reduce coverage breadth and increase conviction depth. The firms that win will not be those covering 500 stocks with AI assistance; they will be those covering 50 stocks with AI-validated, multi-model, adversarially-tested theses — built by analysts who are as comfortable with an API as they are with an annual report.

What new risks does AI introduce?

Model monoculture, automation complacency, adversarial data poisoning, and hallucination propagation create an entirely new class of systemic risk.

Model Monoculture
If the top 20 hedge funds build on the same 2–3 foundation models, a single flaw or data poisoning event creates a correlated, systemic market risk with no historical precedent.

Automation Complacency
The "human-in-the-loop" degrades to the "human-clicking-OK." Junior analysts trained on AI outputs from day one may never develop the pattern recognition that comes from manually working through accounts.

Adversarial Manipulation
When AI agents automatically process press releases and filings, deliberately poisoning these data streams becomes a viable form of market manipulation. Subtle linguistic manipulation designed to exploit known LLM biases is an emerging attack vector.

Hallucination Propagation
AI-generated "facts" enter the information ecosystem and get treated as verified data points by downstream systems. A fabricated data point in one model's output can propagate through agentic chains with no common audit trail.

What would make an AI research product genuinely valuable rather than just a faster summarisation tool?

A Contradiction Engine, not a summarisation engine. The scarce capability is not "what did the CEO say?" but "where is the narrative inconsistent with the data?"

The market is flooded with tools that summarise earnings calls and extract financial data. These are commodities. A genuinely valuable product would:

Cross-reference management statements across quarters to identify shifted narratives and broken commitments — the linguistic equivalent of forensic accounting.

Construct adversarial "red team" analyses against a user's own thesis, systematically identifying the strongest counter-arguments and most likely failure modes.

Surface discrepancies between a company's disclosures and its peers' risk factors — if three of your four competitors flag supply chain risk and you don't, that silence is informative.

Maintain an auditable chain of reasoning that distinguishes sourced facts from imputed assumptions from model-generated estimates, so every number carries a confidence provenance.
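
One way the last requirement could be represented is sketched below. The class and function names are hypothetical, and the rule that a derived number inherits the weakest provenance among its inputs is my assumption about a sensible default, not a description of any existing product. Treating the payout ratio as an imputed assumption reflects that applying a stated policy to a given year is itself a judgment.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    SOURCED = "sourced fact"
    IMPUTED = "imputed assumption"
    MODEL = "model-generated estimate"


@dataclass(frozen=True)
class Figure:
    label: str
    value: float
    provenance: Provenance
    source: str  # document reference, or the reasoning used


def weakest_link(inputs: list[Figure]) -> Provenance:
    """A derived number is only as trustworthy as its least reliable input."""
    order = [Provenance.SOURCED, Provenance.IMPUTED, Provenance.MODEL]
    return max((f.provenance for f in inputs), key=order.index)


eps = Figure("underlying EPS (p)", 14.8, Provenance.SOURCED, "FY26 prelims RNS")
payout = Figure("payout ratio", 0.50, Provenance.IMPUTED, "assumes stated policy applies")
dps = Figure("DPS (p)", eps.value * payout.value, weakest_link([eps, payout]),
             "payout ratio x underlying EPS")
```

With this shape, every number in a report can be traced back to its inputs and labelled at the point of display.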

The product that simply makes analysts faster is a commodity waiting to be competed away. The product that makes analysts more rigorous — that systematically reduces the probability of being confidently wrong — is the one that earns its seat in a professional workflow.

Question 2

AI-Generated Research Report Comparison

Pets at Home Group plc (PETS.L) — FY26 Pre-Close. Four reports cross-referenced against verified financial data from FMP API, Investegate, and analyst coverage.

Reports Reviewed

Report A — Production baseline (Score: 0/5)

Report B — Testing/prod (Score: 0/5)

Report C — Testing/high (Score: +1/5)

Report D — Testing/xhigh (Score: +1/5)

Verification Sources

Investegate: FY26 Prelim Results

FMP API: Income Statement

FMP API: Balance Sheet

FMP API: Cash Flow

MarketScreener: Analyst Consensus

TipRanks: Price Targets

Pets at Home: IR Page

Zeus Podcast: Smallwood Interview

Data Sources Used

Financial Modeling Prep (FMP) API — Income statement, balance sheet, cash flow, ratios, and growth metrics for FY24-FY26. Provided verified revenue (£1,469.6m), statutory PBT (£86.5m), gross margin (45.7%), net debt (£357m inc. leases), FCF (£147m), and the critical goodwill figure (£960m).

Investegate / Company RNS — FY26 Preliminary Results announcement (27 May 2026). Underlying PBT £92.8m, DPS 7.4p (-43.1%), underlying EPS 14.8p, consumer revenue by segment.

Analyst coverage data — MarketScreener, TipRanks, web research. 11 analysts covering: Jefferies (Buy, 265p), Canaccord (Buy, 245p), Berenberg (Hold), Peel Hunt (Hold). Consensus avg target: 222p vs current 192p.

Multi-model debate — Claude Sonnet, GPT-5.4, DeepSeek V4 Pro, Gemini 2.5 Pro independently evaluated all four reports against verified data. Unanimous verdict on best/worst. Cost: ~$0.11.

DeepSeek V4 (open-source replication) — Ran the same analysis through DeepSeek V4 (Apache 2.0, open-source) via API, feeding it the multi-source verified data. The output was compared head-to-head against all four Primer reports to test defensibility. Cost: $0.003. Time: 8 seconds.

Verification Method

Each claim in each report was cross-referenced against at least two independent data sources. Discrepancies were flagged with the magnitude of error and the likely cause. A competing report was generated using open-source tools to test whether the output quality is a function of the model or the data pipeline. The goal was not just to identify which report is "best" but to demonstrate the type of verification work that distinguishes genuine analysis from polished AI output.
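
The cross-referencing rule can be sketched as a small check. `verify_claim` is a hypothetical helper, the 2% tolerance is an arbitrary assumption, and `source_a` and `source_b` are placeholder labels rather than a record of which source supplied which figure.

```python
def verify_claim(claimed: float, sources: dict[str, float],
                 tolerance: float = 0.02) -> dict:
    """Compare a claimed figure with each source; needs two or more sources."""
    if len(sources) < 2:
        raise ValueError("need at least two independent sources")
    if any(actual == 0 for actual in sources.values()):
        raise ValueError("relative error is undefined for a zero source value")
    errors = {name: (claimed - actual) / actual for name, actual in sources.items()}
    worst = max(errors.values(), key=abs)
    return {
        "verified": all(abs(e) <= tolerance for e in errors.values()),
        "worst_error_pct": round(100 * worst, 1),
    }


# Underlying PBT: reports said "c£92m", verified figure is £92.8m.
pbt_check = verify_claim(92.0, {"source_a": 92.8, "source_b": 92.8})
# DPS: midpoint of Report C's -25% to -30% range applied to 13.0p, vs 7.4p actual.
dps_check = verify_claim(13.0 * (1 - 0.275), {"source_a": 7.4, "source_b": 7.4})
```

The PBT claim passes within rounding, while the DPS estimate is flagged with the size of the error attached.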

Summary Verdict
Report D is the strongest: it is the only report that demonstrates genuine analytical thinking beyond restating company disclosures. Reports A and B are effectively identical and provide zero incremental value over a raw data feed. However, all four reports share critical blind spots: none address gross margin compression, balance sheet quality, cash flow generation, or valuation context. The difference between the reports is the difference between "bad" and "less bad"; none would satisfy a competent buy-side analyst without significant supplementary work.

Best Report

Report D — most analytical depth, identifies disclosure gaps

Weakest Report

Report B — identical to A with zero added value

Most Useful to Analyst

Report D — but only with manual supplementation

Best Investment Insight

Report C — "stabilisation, not recovery proof"

Which report is best, and why? Which report is weakest, and why?

Report D is best (original analysis, identifies gaps). Report B is worst (carbon copy of A with no differentiation). The gap between C and D is narrow; the gap between A/B and C/D is a chasm.

1st — Report D

Most Comprehensive

Profit mix table with FY25 comparison. Quantified central + insurance drag (£21m deterioration). Includes original guidance context (£115-125m → £92m). Observation that the company issued a "deliberately narrow statement" shows critical thinking about disclosure strategy. Best at identifying what data is missing and why that matters.

2nd — Report C

Best Editorial Judgment

Adds FY27-28 estimates (£100m/£115m) and bear/base/bull scenarios. Margin sensitivity analysis (50bp = £6-7m PBT) is useful original work. "Stabilisation rather than recovery proof" is the single best editorial line across all four reports. DPS estimate of -25-30% was materially wrong (actual: -43.1%).

3rd — Report A

Data Regurgitation

Restates company disclosures without original analysis. Notes the retail consensus miss (£51.1m vs £30m actual) but attributes it to "definition mismatches" without investigation. No forward estimates, no scenarios, no balance sheet analysis, no valuation context. A raw data feed with formatting.

4th — Report B

Zero Added Value

Nearly identical to Report A in structure, data, and conclusions. Same "definition mismatch" hand-wave. If two AI-generated reports are indistinguishable, one of them should not exist. Report B adds nothing that Report A does not already provide.

What factual, numerical, or reasoning issues do you see?

Cross-referenced every claim against FMP API data, company RNS, and analyst coverage. Key errors: all reports underestimate the DPS cut, none address gross margin compression (120bp), and the "definition mismatch" excuse avoids a critical question.

Metric	Report Claims	Verified Actual	Verdict
Underlying PBT	All: "c£92m"	£92.8m	Accurate (within rounding)
Vet Group PBT	All: "c£83m"	~£83m	Accurate
FY25 Retail PBT	All: "£72.9m"	£72.9m	Accurate
Net Debt	All: "c£20m"	£357m (inc. leases); ~£20m (ex-leases)	Correct per company definition, but no report flags the £357m lease-inclusive net debt, the £397m total debt, or the definitional difference
DPS Change	C: "-25-30%" D: "-25-35%"	7.4p (-43.1%)	Arithmetic failure. All reports cite the 50% payout rebase (announced at pre-close) but then estimate the DPS cut as a range instead of calculating it. 50% × ~14.5p estimated EPS = ~7.25p, from 13p = -44%. The answer was derivable from the company's own stated policy. This is not an estimation error — it is a failure to perform the calculation.
Revenue	All: "Not disclosed"	£1,469.6m (-0.8%)	Correct that pre-close omitted revenue, but no report flagged this as an analytical risk
Gross Margin	None mention	45.7% (from 46.9%)	120bp compression entirely missed. £17.6m gross profit impact not discussed.
FCF	None mention	£147m	Strong cash generation ignored. FCF yield of 17.8% is decision-relevant.
Goodwill / Tangible Equity	None mention	£960m / -£8.6m	Total intangible assets (goodwill £960m + other £22m = £982m) exceed total equity (£973m). Tangible book value is negative. Major balance sheet risk entirely unaddressed.
Valuation	None mention	EV/EBITDA 6.05x	No valuation context whatsoever. Current price 192p vs consensus target 222p (15% upside).
Retail consensus miss	A/B: "£51.1m vs £30m — definition mismatch"	£30m actual	The £21m gap deserves investigation, not a hand-wave. See analysis below.

The "Definition Mismatch" Problem
Reports A and B note that retail consensus was £51.1m versus actual £30m — a 41% miss — but dismiss this as a "definition mismatch (segment PBT vs operating income)." This is the analytical equivalent of shrugging. A competent report would: (1) attempt to reconcile the definitions, (2) flag the magnitude as material regardless of definition, (3) investigate whether the miss reflects genuine operational deterioration that consensus hadn't priced in, and (4) frame it as a key question for management. Instead, the reports treat a £21m miss as a data quality footnote rather than a fundamental analytical finding.

The Negative Tangible Equity Risk
The most significant omission across all reports. Pets at Home carries £960m of goodwill and £22m of other intangibles against £973m of total equity, leaving tangible equity at negative £8.6m. This means the company's entire book value rests on the assumption that its acquired businesses (primarily veterinary practices) are worth what was paid for them. With underlying PBT down 30% and retail profits down 59%, the question of whether a goodwill impairment review is warranted is not academic: it is the single most important balance sheet question an analyst should be asking. No report raises it.
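
The tangible equity arithmetic is simple enough to check in a few lines. The inputs are the rounded figures quoted in this review (including the £22m of other intangibles from the table above), so the result is -£9m; the -£8.6m quoted presumably reflects unrounded balance sheet values, which is my assumption.

```python
def tangible_equity(total_equity: float, goodwill: float, other_intangibles: float) -> float:
    """Tangible equity = total equity less goodwill and other intangible assets (GBP m)."""
    return total_equity - goodwill - other_intangibles


# Rounded figures quoted in this review (GBP m).
EQUITY, GOODWILL, OTHER_INTANGIBLES = 973.0, 960.0, 22.0

te = tangible_equity(EQUITY, GOODWILL, OTHER_INTANGIBLES)  # negative on rounded inputs
goodwill_share = round(100 * GOODWILL / EQUITY, 1)          # goodwill as % of total equity
```

Note that goodwill alone is below total equity; it is the other intangibles that tip tangible equity negative.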

Which report would be most useful to an equity analyst?

Report D, but only as a starting point. It would save ~20 minutes of initial data gathering but would require 2+ hours of supplementary work on balance sheet, cash flow, valuation, and margin analysis before a management meeting.

For a buy-side analyst preparing for a management meeting, Report D provides the best foundation because it identifies the right questions: why was disclosure so narrow? What is the insurance drag trajectory? Why does the segment profit bridge not add up cleanly? These are productive starting points for management engagement.

Report C's editorial judgment is sharper. The phrase "stabilisation rather than recovery proof" is the kind of conclusion an analyst needs — it frames the investment debate correctly and tells you what to watch for in the prelims. Report C also provides the only margin sensitivity analysis (50bp retail margin = £6-7m group PBT impact), which is directly useful for scenario modelling.

What an analyst would still need to do manually:

Balance sheet review: Pull the balance sheet from Companies House or FMP. Discover the £960m goodwill, negative tangible equity, and £397m total debt. None of the reports do this.

Cash flow analysis: FCF of £147m against £92.8m underlying PBT is roughly 1.6x, and operating cash flow is 2.05x PBT, so cash conversion is strong. This is a positive signal that partially offsets the earnings decline. Not mentioned in any report.

Valuation context: At 192p, EV/EBITDA of 6.05x with 17.8% FCF yield. Compare to Frasers Group (5.2x), CVS Group (12.4x), or B&M European (7.1x). No report provides peer comparison.

Analyst consensus cross-check: 11 analysts cover PETS.L. Split: 5 Buy, 3 Hold, 3 Sell. Average target 222p (15% upside). Jefferies at 265p (Buy), Peel Hunt at Hold. This consensus data is essential context for any investment discussion.

What important context is missing?

Eight critical items that no report addresses. The missing balance sheet analysis alone would change the risk assessment of this stock.

#	Missing Item	Why It Matters	Verified Data
1	Gross margin compression	120bp decline signals pricing/cost pressure that directly impacts the recovery thesis	45.7% (from 46.9%)
2	Free cash flow	£147m FCF against £92.8m PBT shows strong cash conversion — a bullish signal hidden by the earnings decline	£147m (17.8% yield)
3	Goodwill / tangible equity	£960m goodwill = 98.7% of equity. Tangible equity is NEGATIVE. Impairment risk is material with declining profits	-£8.6m tangible equity
4	Total debt (inc. leases)	Reports use company's ~£20m ex-lease figure without flagging the £397m total. Net debt/EBITDA of 1.83x is moderate but worth discussing	£397m total debt
5	Valuation multiples	No EV/EBITDA, P/E, or FCF yield. Without valuation, a research report is just a news summary	6.05x EV/EBITDA
6	Analyst consensus & coverage	11 analysts, split verdict (5 Buy / 3 Hold / 3 Sell), avg target 222p. Essential for positioning a view	222p avg (15% upside)
7	Share count reduction	463.5m → 454.4m shares via buybacks. Affects EPS calculation and per-share metrics	-2.0% share count
8	Historical earnings trajectory	FY24 PBT £105.7m → FY25 £120.6m → FY26 £86.5m (statutory). The FY25 peak and FY26 collapse tell a story none of the reports contextualise	3-year statutory PBT trend
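
Items 7 and 8 can be reproduced with a one-line percentage change, sketched here with a hypothetical `pct_change` helper. On the statutory figures the FY26 decline works out at about 28%, slightly less than the 30% fall in underlying PBT quoted elsewhere in this review.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`, rounded to one decimal place."""
    return round(100 * (new - old) / old, 1)


statutory_pbt = {"FY24": 105.7, "FY25": 120.6, "FY26": 86.5}  # GBP m, from item 8
years = list(statutory_pbt)
pbt_trend = {
    f"{a}->{b}": pct_change(statutory_pbt[a], statutory_pbt[b])
    for a, b in zip(years, years[1:])
}
share_count_change = pct_change(463.5, 454.4)  # millions of shares, from item 7
```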

Which claims feel unsupported, generic, or overconfident?

The reports exhibit a pattern of accepting management framing uncritically, rationalising misses rather than investigating them, and making forward estimates without sufficient basis.

All reports: "Net debt c£20m"
Uncritically adopts the company's ex-lease definition without mentioning the £397m total debt, £357m net debt (inc. leases), or the 1.83x net debt/EBITDA ratio. This is not factually wrong, but presenting only the favourable definition without the alternative is a failure of analytical balance.

Reports A/B: "Definition mismatch" on retail consensus
A £21m miss (41% below consensus) is explained away as a definitional issue. This is the most dangerous type of AI output: a plausible-sounding rationalisation that prevents the analyst from asking the right question. The right response is to flag the magnitude, investigate the cause, and prepare management questions — not to explain it away.

Report C: DPS impact of "-25-30%"
Actual DPS cut was -43.1% (7.4p from 13.0p). The estimate was materially wrong. More importantly, the reasoning was not shown: a good report would derive DPS from a stated payout ratio (50%) against forecast EPS, not estimate it as a percentage range. The actual 7.4p DPS against 14.8p underlying EPS implies a 50% payout — exactly what management signalled. The AI should have calculated this.
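
The calculation the reports skipped takes a few lines. This sketch uses only figures quoted in this review: the 50% payout policy, the prior 13.0p DPS, the ~14.5p EPS estimate available at pre-close, and the 14.8p underlying EPS later reported. The helper names are hypothetical.

```python
def implied_dps(eps_pence: float, payout_ratio: float) -> float:
    """DPS implied by applying a payout ratio to EPS (pence per share)."""
    return eps_pence * payout_ratio


def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`, rounded to one decimal place."""
    return round(100 * (new - old) / old, 1)


PRIOR_DPS = 13.0  # FY25 DPS, pence
PAYOUT = 0.50     # stated payout policy

pre_close = implied_dps(14.5, PAYOUT)  # using the ~14.5p EPS estimate
reported = implied_dps(14.8, PAYOUT)   # using reported underlying EPS

cut_pre_close = pct_change(PRIOR_DPS, pre_close)
cut_reported = pct_change(PRIOR_DPS, reported)
```

Either input puts the cut in the low-to-mid 40s percent, well outside the ranges the reports gave.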

All reports: CMA as "benign" / "no adverse impact"
All four reports accept the company's characterisation of the CMA outcome without independent analysis. The CMA's veterinary market investigation is a material regulatory event, and simply restating management's framing of it as positive is not analysis — it is PR.

Report C/D: FY28 PBT estimates (£110-115m)
Two-year-forward estimates made without revenue data, margin trends, or competitive context. These numbers are presented with implied precision that the available data does not support. The range should be explicitly wider, or the estimate should be presented with clearly stated assumptions and confidence intervals.

Which report best identifies what actually matters from an investment perspective?

Report C frames the investment debate most clearly. Report D provides the best analytical foundation. Neither addresses the three things that would actually drive a buy/sell decision: valuation, balance sheet quality, and cash flow sustainability.

What actually matters for an investment decision on PETS.L:

1. Is the retail recovery real or cosmetic? Report C's framing — "stabilisation rather than recovery proof" — is exactly right. Full-year retail PBT at £30m is 59% below FY25. H2 improvement is encouraging but H2 retail PBT of ~£26.5m is still roughly half of FY25 H2. The prelims need to show revenue trajectory, LFL breakdown, and margin progression. None of the reports push hard enough on this.

2. Is the valuation pricing in the downside? At 192p, EV/EBITDA of 6.05x and FCF yield of 17.8% suggest the market has priced in significant pessimism. If FY27 PBT reaches £99m consensus and the share count continues to shrink via buybacks, EPS accretion could be meaningful. None of the reports provide this context.

3. Is the balance sheet sound? £960m of goodwill on a company generating £92.8m of underlying PBT raises the question of whether the acquired vet practices are delivering adequate returns on the capital deployed to acquire them. With tangible equity negative, any impairment would directly reduce book value and potentially trigger covenant concerns. This is the question no report asks.

4. What is the capital allocation framework? Management rebased the dividend to 50% payout, launched a £50m buyback, and has £147m of FCF. This suggests confidence in cash generation despite the profit decline. The reports note the dividend rebase but don't connect it to the FCF story or the buyback implications for per-share value creation.

Bottom line: Report C best identifies the qualitative investment debate. Report D best identifies the analytical gaps. Neither comes close to what a competent human analyst would produce, because both are fundamentally constrained by what the company chose to disclose, rather than investigating what it chose not to.

Multi-Model Debate: Cross-Validated Verdict

Four AI models independently evaluated the reports against verified data. Unanimous on Report D as best, A/B as worst. Key insight from Gemini: the reports demonstrate a "failure of cross-statement analysis" — none connect P&L, balance sheet, and cash flow.

Model	Best	Worst	Key Critique
GPT-5.4	D	A	"Report D provides comprehensive context; A fails to offer substantive analysis beyond restating data"
Claude Sonnet	D	A	"Negative tangible equity is the smoking gun none addressed — PETS is trading on £960m of acquisition goodwill with declining profitability"
DeepSeek V4	D	B	"B is plagiarised trash — pure copy of A with zero added value. Goodwill is 99% of equity — technically insolvent if impaired."
Gemini 2.5 Pro	D	A/B	"Even the best report operates at a superficial level. The AI failed a simple capital allocation derivation: given PBT, FCF, net debt, and a buyback, management would prioritise buyback over dividend."

Gemini's architectural critique
"None of the reports connected the three financial statements. The massive £960m in Goodwill directly leads to the negative tangible equity and presents a major impairment risk. A competent system must link P&L, balance sheet, and cash flow — this is a failure of cross-statement analysis, not just a missing data point."

The debate revealed that the gap between the reports is less interesting than their shared failures. The AI models generating these reports appear unable to perform the foundational task of equity analysis: connecting financial statements to each other and deriving implications that are not explicitly stated in the source text.

Our Report: Multi-Source Analysis vs Single-Document Extraction

We generated a competing report using DeepSeek V4 (open-source) fed with verified data from 4 sources. Below is the full report with source annotations, followed by a head-to-head comparison against Reports A-D.

Model: DeepSeek V4 (Apache 2.0, open-source) | Cost: $0.003 | Time: 8 seconds | Data sources: FMP API, Investegate RNS, MarketScreener, TipRanks

Pets at Home Group plc (PETS.L) — Earnings Briefing

FY26 (52 weeks to 26 March 2026) | Generated from multi-source verified data

Headline: A Year of Two Halves, but Structural Questions Remain. Pets at Home delivered a weak FY26, with underlying PBT of £92.8m falling 30% YoY and well below the original guidance range of £115-125m. The headline narrative of "H2 recovery" is supported by improving retail volumes, but the damage was done in H1, and the full-year numbers reveal material margin erosion. The dividend rebase (-43%) and £50m buyback signal management's intent to return capital, but the balance sheet carries a significant goodwill overhang that cannot be ignored.
[Sources: Investegate RNS, FMP Income Statement]

Retail (PBT: ~£30m, -59% YoY). The primary drag. Gross margin compression of 120bps (to 45.7%) reflects promotional intensity, mix shift, and cost inflation not passed through. H1 retail PBT of just £3.5m was catastrophic; H2 implied ~£26.5m shows recovery, but full-year profitability remains well below FY25's £72.9m. The division now generates a PBT margin of roughly 2%.
[Sources: FMP Ratios (gross margin), Reports A-D (segment split)]

Vet Group (PBT: ~£83m, +9% YoY). The standout performer. Consumer revenue growth of 5% and PBT expansion of 9% demonstrate pricing power. The Vet Group now accounts for nearly 90% of group underlying PBT. This concentration risk is underappreciated: if vet growth slows, the group's earnings base is vulnerable.
[Source: Investegate RNS (segment revenue)]

Gross Margin: The 120bp Wound. The most concerning data point. Gross margin declined from 46.9% to 45.7% — a 120bp compression representing ~£17.6m of gross profit impact. This is structural, not one-off. If retail margins do not recover in FY27, operating leverage is severely impaired.
[Source: FMP Income Statement (FY25 GP £694.7m/46.9% vs FY26 GP £672.0m/45.7%)]

Balance Sheet: The Elephant in the Room. Tangible equity is negative £8.6m. The group's entire equity base of £973m is supported by £960m of goodwill from historical acquisitions (primarily veterinary practices). Any impairment — even a modest 10% write-down of £96m — could trigger covenant issues. Net debt/EBITDA of 1.83x is manageable, but the goodwill-to-equity ratio of 98.7% is extreme.
[Source: FMP Balance Sheet (Goodwill £960m, Equity £973m, Total Debt £397m)]

Cash Flow: The Silver Lining. Operating cash flow of £190m and FCF of £147m (17.8% yield) are strong. OpCF/PBT ratio of 2.05x demonstrates robust cash conversion despite declining earnings. However, FCF strength is partly a function of low CapEx (£43m, 2.9% of revenue) — not underlying earnings growth.
[Source: FMP Cash Flow (OpCF £190m, CapEx £43m, FCF £147m)]

Valuation: Cheap for a Reason. At 192p: EV/EBITDA 6.05x, P/E ~13x (underlying), FCF yield 17.8%. The market has priced in further erosion. Analyst consensus: 11 covering, 5 Buy / 3 Hold / 3 Sell. Average target 222p (15% upside). Jefferies at 265p (Buy), Peel Hunt at Hold.
[Sources: FMP Key Metrics, MarketScreener, TipRanks]

FY27 Scenarios (Group Underlying PBT):
Bear (£80m): Retail margins fail to recover, Vet slows to 3%. Stock to 150p.
Base (£98m): Consensus. Retail stabilises, Vet grows 5%. Stock to 200p.
Bull (£110m): Retail margins recover 50bps, Vet accelerates. Stock to 250p.

Conclusion: Cautious, Not Conviction. Cash-generative with a strong vet franchise, but retail is structurally challenged, the balance sheet carries significant goodwill risk, and margin compression is not yet arrested. Neutral — pending greater visibility on retail margins and goodwill impairment risk.

Head-to-Head: Our Report vs Primer Reports

Dimension	Primer Reports (A-D)	Our Report	Source
Gross margin	Not mentioned in any report	120bp compression identified as "most concerning data point"	FMP
Goodwill / tangible equity	Not mentioned in any report	£960m goodwill, -£8.6m tangible equity flagged as impairment risk	FMP
Cash flow analysis	Not mentioned in any report	FCF £147m (17.8% yield), OpCF/PBT 2.05x identified as positive signal	FMP
Valuation context	No multiples, no peer comparison, no analyst targets	EV/EBITDA 6.05x, P/E ~13x, FCF yield 17.8%, 11 analysts, avg target 222p	FMP, MarketScreener, TipRanks
Investment conclusion	No position taken in any report	Neutral, pending greater visibility on retail margins and goodwill impairment risk. Clear stance with reasoning	Editorial judgment
Vet concentration risk	"Earnings anchor" (positive framing only)	"~90% of group PBT. Concentration risk is underappreciated"	Derived from segment data
DPS forecast	C: -25-30%, D: -25-35% (actual: -43.1%)	Correctly states -43% actual cut, links to 50% payout policy	Investegate
Underlying PBT	All: "c£92m" (accurate)	£92.8m (precise)	Both accurate
H2 recovery narrative	C: "stabilisation not recovery" (good)	"H2 recovery supported but damage done in H1" (similar)	Both adequate

Why our report is better — and why it's not about the model
The improvement comes from the data pipeline, not model quality. Primer's reports appear to analyse only the pre-close statement text. Our report was fed verified data from FMP (balance sheet, cash flow, ratios), Investegate (prelim results), and analyst consensus sources. The same DeepSeek model given only the pre-close text would likely produce output comparable to Reports A-D. The lesson: multi-source data ingestion is a more defensible moat than single-document extraction accuracy.

Replicability: What Open-Source Can Do Today — and Where Primer's Real Moat Should Be

Report generation is replicable with open-source models for ~$0.003. The improvement comes from multi-source data pipelines, not model quality. Primer's genuine moat is workflow encoding and agent memory — but these reports don't demonstrate it.

The defensibility question. To test whether Primer's report output is replicable, I ran the same analysis through DeepSeek V4 (open-source, Apache 2.0 licence) via API. The prompt included verified data from the FMP financial API, the company's RNS announcement, and analyst coverage data — the same multi-source approach an analyst would take. Cost: $0.003. Time: 8 seconds.

The result: the open-source report outperformed all four Primer reports on each of the eight dimensions compared below.

Dimension	Primer Reports (A-D)	DeepSeek V4 (Open Source)
Gross margin analysis	None mention the 120bp compression	Identifies it as "the most concerning data point" and links to structural pressures
Balance sheet / goodwill	None address £960m goodwill or negative tangible equity	Calls it "the elephant in the room" — flags impairment risk and covenant exposure
Cash flow	None mention £147m FCF	Analyses FCF yield (17.8%), notes it's driven by low CapEx not earnings growth
Valuation context	No EV/EBITDA, P/E, or peer comparison	Provides EV/EBITDA (6.05x), P/E (~13x), FCF yield, and analyst target context
DPS forecast	C: -25-30%, D: -25-35% (actual: -43.1%)	States actual cut correctly (-43%) and links to 50% payout policy
Data extraction accuracy	Core numbers accurate (c£92m vs £92.8m actual)	Uses verified actuals directly from multiple sources
Investment conclusion	No position taken	Takes a clear stance: Neutral, pending greater visibility on retail margins and goodwill impairment risk
Vet concentration risk	Notes vet is "earnings anchor" but doesn't flag risk	"Vet now accounts for ~90% of group PBT. This concentration risk is underappreciated."

Why the Open-Source Report Is Better

The improvement does not come from a better model. It comes from a better data pipeline. The Primer reports appear to analyse only the pre-close statement in isolation. The open-source report was fed data from four sources:

FMP API — 3-year income statement, balance sheet (revealing the £960m goodwill), cash flow (revealing the £147m FCF), ratios (revealing the 120bp margin compression), and growth metrics.

Company RNS (Investegate) — Full preliminary results with actual DPS (7.4p), revenue by segment, and underlying EPS.

Analyst coverage (MarketScreener, TipRanks) — 11-analyst consensus, individual broker ratings and targets, sentiment split.

Financial growth metrics — 3-year revenue, gross profit, and net income growth trends for contextualising the cycle.

This is the critical insight: 100% retrieval accuracy from a single source document is a solved problem. Smallwood himself acknowledged this in the Zeus podcast: "pulling numbers correctly doesn't make them a great analyst." The Primer reports prove this — they extract accurately from the pre-close statement, but they do not cross-reference against the balance sheet, cash flow statement, or external data sources. The result is reports that are precisely accurate about what the company chose to disclose, and entirely silent about what it didn't.

Open-Source Tools That Could Replicate This

Tool	What It Does	Licence
FinRobot	8 specialised agents, multi-page equity research with DCF, 15+ chart types, 3-year projections	MIT (Open Source)
DeepSeek V4 Pro	1.6T parameter model, strong financial reasoning, long-context (128K), agentic workflow capable	Apache 2.0
LlamaIndex + LlamaExtract	Structured data extraction from SEC/RNS filings with citation tracking and source traceability	MIT
Llama 4 Scout	10M token context window — can ingest entire annual reports, 5 years of filings simultaneously	Meta Community
FMP / Polygon / FRED APIs	Real-time financial data, historical statements, macro overlays — provide the multi-source data layer the reports lack	Commercial (low cost)

Where Primer's Genuine Moat Should Be

The podcast makes a compelling case for three capabilities that open-source tools cannot easily replicate:

1. Workflow Encoding — the "2,000 Modules" Problem
Smallwood describes 2,000 modular analytical tasks an analyst knows how to perform, from forensic accounting to peer comparison. The value is not in executing any single module but in knowing which module comes next. This agentic sequencing — deciding that after spotting aging receivables you should check the supplier 10-Ks — is genuinely hard to replicate with generic open-source tools. The reports, however, do not demonstrate this capability. They follow a linear template, not an adaptive workflow.

2. Agent Memory — Compounding Context Over Time
Primer's agent remembers every interaction, building a mental model of each covered company. Smallwood describes this as the "compounding effect" — the agent inherits the analyst's view and improves suggestions over time. This is a genuine switching cost and a defensible moat. But it is a platform moat, not a report quality moat. These static reports do not benefit from it.

3. Programmable Analyst Rules
Users can "pin" instructions to the agent — e.g., "always model retail on a pre-IFRS 16 basis." This customisation creates a personalised analytical engine that improves with use. Again, the static reports don't show this; they are one-size-fits-all outputs.

The constructive conclusion: Primer's static report output is replicable and, when compared against a multi-source data pipeline, is outperformed by open-source alternatives at negligible cost. The genuine product differentiation — workflow encoding, agent memory, and programmable rules — is compelling but is not visible in these reports. The product roadmap should prioritise making these interactive, compounding capabilities the primary value proposition, rather than competing on static report generation where the moat is thin.

Question 3

Loom Video

Five-minute video explaining Primer to a skeptical Head of Research, then addressing: why not ChatGPT, Claude, or AlphaSense?

First 90 Seconds: The Pitch to a Skeptical Head of Research

Open with the problem, not the product.

"Your analysts spend 60% of their time on data extraction and model building — work that AI now does in minutes. The remaining 40% — judgment, conviction, behavioural overlay — is where your alpha comes from. But most AI tools optimise for the 60% and ignore the 40%. Primer is different: it's built by analysts who've sat in your seat, and it's designed to make the judgment work better, not just the grunt work faster."

The differentiator in one sentence.

"Primer doesn't just extract data — it remembers how you think about each company, learns your analytical preferences, and compounds that context over time. It's a co-pilot that gets smarter the more you use it."

Why Not Just Use ChatGPT?

No memory, no workflow, no auditability.

ChatGPT is a general-purpose tool. It doesn't remember your last session, can't enforce your analytical rules (e.g. "always model pre-IFRS 16"), and has no audit trail for where numbers came from. In a regulated environment with capital at stake, "I asked ChatGPT" is not a defensible process. Primer gives you a walled-garden agent that inherits your methodology.

Why Not Just Use Claude or Claude Code?

Powerful reasoning, but no financial domain architecture.

Claude is excellent at analysis — I used it to build this assessment. But it doesn't have structured financial data ingestion, can't pull live filings, doesn't maintain a coverage universe, and starts from zero every session. Primer has 100% retrieval accuracy from source documents and persistent analyst-specific context. Claude is a brilliant brain with no filing cabinet.

Why Not Just Use AlphaSense?

Search vs. synthesis — and the data cross-referencing gap.

AlphaSense is exceptional at finding information across documents. But the real analytical value isn't in finding — it's in cross-referencing. Can AlphaSense automatically pull the balance sheet from FMP, compare gross margin trends over 3 years, check whether management's net debt definition excludes lease liabilities, and flag that tangible equity is negative? That requires structured data pipelines feeding verified financial data into the analysis, not just document search. Primer does the thinking after the finding. The opportunity is to combine Primer's workflow intelligence with multi-source data verification — that's the product nobody else has built.

The Bigger Point: Multi-Model Verification

None of these tools — ChatGPT, Claude, or AlphaSense — verify their own output.

Every AI tool gives you a single model's answer. But GPT-5.4 and Claude Opus are trained on different data, with different biases and different failure modes. When you run the same question through both and they agree, confidence in the answer rises, although shared blind spots remain possible. When they diverge, that divergence is the most valuable signal in the entire analysis. Primer is well-positioned to build this: you already have both OpenAI and Anthropic integrated. The step from "user selects one model" to "system runs both and flags divergence" is the highest-value product evolution available.

Question 4

AI Tool Use Note

Which tools were used, how they were used, where they helped, and where human judgment was still required.

Claude Code (Claude Opus 4.6): Primary orchestrator
Research orchestration, argument structuring, HTML generation, and final synthesis across all four questions. Chosen for long-context reasoning and ability to coordinate multi-step workflows. Used throughout.

Multi-Model Debate Engine (Sonnet + GPT-5.4 + DeepSeek V4 + Gemini 2.5 Pro)
Two structured debates: (1) AI & investing thesis vs counter-thesis, (2) report quality evaluation against verified data. Each model argued independently before Gemini synthesised. This is the method the memo advocates: divergence between models is signal. Total cost: ~$0.22.

Groq Whisper API (whisper-large-v3): Audio transcription
Downloaded Alistair Smallwood's Zeus Capital podcast (63 min), split into 7 chunks under the 25MB API limit, transcribed all chunks. Full transcript: 60,539 characters. Used to extract Primer's own positioning and identify the counter-thesis for the memo.

Financial Modeling Prep (FMP) API: Verified financial data
3-year income statements, balance sheets, cash flows, ratios, and growth metrics for PETS.L. This provided the verified actuals (revenue, margins, goodwill, FCF, debt) used to cross-reference the four reports. The single most valuable data source: it revealed the 120bp gross margin compression and negative tangible equity that all reports missed.

DeepSeek V4 (open-source): Defensibility test
Generated a competing equity research report from the same data to test whether Primer's output is replicable. The open-source report outperformed all four Primer reports across 8 dimensions. Cost: $0.003. Time: 8 seconds. The improvement came from the data pipeline (multi-source), not the model.

Playwright (browser automation): Product analysis
Rendered Primer Studio's signup and app pages to understand the product architecture: invite-only access, Next.js on Render, S3+CloudFront reports, Cloudinary images. Mapped the full feature set from the UI shell.

Web Search (3 parallel queries per question): Current data
AI in equity research workflows (Jefferies rollout), hallucination statistics (Harvard/MIT), alpha compression literature (PineBridge/CFA), Pets at Home analyst coverage and FY26 results.

Where AI Helped vs. Where It Didn't

AI excelled at breadth (processing audio, running parallel searches, cross-referencing data). Human judgment was required for thesis direction, editorial tone, and understanding organisational reality.

AI helped most with: Research breadth and speed. Processing 63 minutes of audio, running 4 independent model perspectives, cross-referencing financial data across 4 sources, and identifying blind spots (Gemini's "Contradiction Engine" framing, the proprietary data insight). Without AI tools, this assessment would have taken 8-10 hours. With them, it took approximately 3 hours.

AI did not help with: The central thesis of Q1 (processing → validation) is a judgment call from experience, not a model output. The decision to challenge the "cover more stocks" thesis was mine — two of four models disagreed. The report comparison required understanding what a buy-side analyst would actually do with these reports, which is experiential knowledge. And the Loom talking points required understanding how fund managers think about tool adoption, which no model captured well.

Tools I use daily: Claude Code (primary development and research), Claude (analysis and writing), Cursor (coding), ChatGPT (quick queries and second opinions), Whisper (transcription), various financial APIs (FMP, Polygon, FRED). I build multi-model debate pipelines and agentic workflows as part of my day job; the methods used in this assessment are how I work, not a performance for the submission.

Appendix

Discussion Points: Product & Architecture

These are not feature requests or a build plan — they are points of discussion arising from the report analysis, podcast review, and competitive landscape research. Each represents a question I'd want to explore with the team: where does the product roadmap prioritise, what are the trade-offs, and which of these would deliver the highest marginal value to buy-side users?

Should reports cross-reference beyond the source document?

The single biggest improvement. Cross-referencing the pre-close statement against the balance sheet, cash flow, and external data sources would have caught the gross margin compression, negative tangible equity, and FCF story that all four reports missed.

The problem: All four reports appear to analyse only the pre-close statement text. They extract accurately from that document but do not cross-reference against Companies House filings, prior annual reports, or financial data APIs. The result is reports that are precisely right about what the company chose to disclose and entirely silent about what it didn't.

The fix: Before generating any report, the agent should automatically pull the most recent balance sheet (goodwill, debt, tangible equity), cash flow statement (FCF, OpCF, CapEx), and 3-year income statement trends (margin trajectory). These are publicly available via APIs like FMP at negligible cost. Every report should include a mandatory "Balance Sheet & Cash Flow" section, even if the source document doesn't mention them — especially if it doesn't.

Impact: This alone would have caught the £960m goodwill / negative tangible equity risk, the 120bp gross margin compression, and the £147m FCF that supports the valuation case. These are the three most decision-relevant facts about Pets at Home, and none appeared in any report.
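The three facts named above are each a line or two of arithmetic once the statement figures are available. A minimal sketch, using the FY25/FY26 figures quoted earlier in this document (gross profit and margin, goodwill, equity, OpCF, CapEx); the function names are illustrative assumptions, not Primer's or FMP's API.

```python
def margin_change_bps(prior_pct: float, current_pct: float) -> float:
    """Change in a percentage margin, expressed in basis points."""
    return round((current_pct - prior_pct) * 100, 1)


def gross_profit_impact(gross_profit: float, margin_pct: float, change_bps: float) -> float:
    """Gross profit gained or lost from a margin move, at the revenue implied by GP / margin."""
    implied_revenue = gross_profit / (margin_pct / 100)
    return implied_revenue * change_bps / 10_000


def goodwill_to_equity(goodwill: float, equity: float) -> float:
    """Share of book equity represented by goodwill."""
    return goodwill / equity


def free_cash_flow(operating_cash_flow: float, capex: float) -> float:
    """Simple FCF definition: operating cash flow less capital expenditure."""
    return operating_cash_flow - capex


# Figures in GBP millions, as quoted in the report's source annotations.
bps = margin_change_bps(46.9, 45.7)
print("gross margin change (bps):", bps)
print("gross profit impact (GBP m):", round(gross_profit_impact(672.0, 45.7, bps), 1))
print("goodwill / equity:", round(goodwill_to_equity(960, 973), 3))
print("FCF (GBP m):", free_cash_flow(190, 43))
```

Tangible equity is deliberately left out of the sketch: it also requires other intangible assets, which are not quoted in this document.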

How far should automatic tri-statement analysis go?

No report connected the three financial statements. A competent analyst always does. This should be a hardcoded analytical step, not optional.

The problem: The reports treat the income statement in isolation. But equity analysis fundamentally requires connecting statements: does the P&L decline flow through to cash? Is the balance sheet supporting or constraining the recovery? Are dividends covered by cash flow or funded by debt?

The fix: Build a mandatory "Tri-Statement Sanity Check" into every report. For example: (1) PBT declined 30% — did OpCF decline proportionally? (No: OpCF only declined 13%, signalling strong cash conversion.) (2) Dividend was cut 43% — is the new DPS covered by FCF? (Yes: FCF of £147m covers the ~£34m dividend 4.3x.) (3) Net debt includes £397m of lease liabilities — does the company definition of "c£20m net debt" match reality? (Only if you exclude leases.)

Impact: This would differentiate Primer from every competitor that just summarises the P&L. It's also where Smallwood's "2,000 modules" concept should shine — the agent deciding to check cash flow quality after spotting an earnings decline is exactly the kind of adaptive workflow that's hard to replicate with generic tools.
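The first two checks in the Tri-Statement Sanity Check reduce to ratios over figures already quoted in this document. A minimal sketch under that assumption; the function names and the 5-point tolerance are hypothetical choices for illustration, not a description of any existing Primer module.

```python
def cash_conversion(operating_cash_flow: float, pbt: float) -> float:
    """Operating cash flow per unit of profit before tax."""
    return operating_cash_flow / pbt


def dividend_cover(fcf: float, dividend_cost: float) -> float:
    """How many times free cash flow covers the cash cost of the dividend."""
    return fcf / dividend_cost


def moved_in_step(change_a: float, change_b: float, tolerance: float = 0.05) -> bool:
    """True when two year-on-year changes differ by no more than `tolerance`."""
    return abs(change_a - change_b) <= tolerance


# GBP millions and year-on-year changes as stated in the text above.
checks = {
    "OpCF/PBT (x)": round(cash_conversion(190, 92.8), 2),
    "FCF dividend cover (x)": round(dividend_cover(147, 34), 1),
    "PBT and OpCF declined in step": moved_in_step(-0.30, -0.13),
}
for name, value in checks.items():
    print(f"{name}: {value}")
```

The third check (lease liabilities inside or outside net debt) is a definitional comparison rather than a ratio, so it is better handled as an explicit flag in the report text than as a computed number.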

Should every report ship with valuation and consensus context?

None of the reports included EV/EBITDA, P/E, FCF yield, or analyst consensus data. A research report without valuation context is a news summary.

The problem: The reports tell you what happened but not whether it matters for the investment decision. At 192p with a 17.8% FCF yield and EV/EBITDA of 6.05x, the market has already priced in significant pessimism. Without this context, an analyst can't determine whether the earnings miss creates a buying opportunity or confirms a value trap.

The fix: Every report should include a standardised valuation footer: current price, market cap, EV/EBITDA, P/E, FCF yield, dividend yield, and analyst consensus (number of analysts, Buy/Hold/Sell split, average target, range). This data is available from free and low-cost APIs. The agent should also flag when valuation metrics move to historical extremes — e.g., "FCF yield of 17.8% is the highest since FY19."
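A sketch of what part of such a footer could compute, using the 192p price, 222p average target, and 14.8p underlying EPS quoted in this document. `ValuationFooter` and `is_new_high` are hypothetical names, and the history passed to `is_new_high` would have to come from a data source not described here.

```python
from dataclasses import dataclass


@dataclass
class ValuationFooter:
    price_p: float       # share price in pence
    eps_p: float         # underlying EPS in pence
    avg_target_p: float  # average analyst price target in pence

    @property
    def pe(self) -> float:
        """Price/earnings multiple on underlying EPS."""
        return self.price_p / self.eps_p

    @property
    def upside_to_target(self) -> float:
        """Fractional upside from the current price to the average target."""
        return self.avg_target_p / self.price_p - 1


def is_new_high(current: float, history: list[float]) -> bool:
    """True when the current reading exceeds every prior reading supplied."""
    return bool(history) and current > max(history)


footer = ValuationFooter(price_p=192, eps_p=14.8, avg_target_p=222)
print(f"P/E {footer.pe:.1f}x, upside to average target {footer.upside_to_target:.1%}")
```

Keeping the multiples as derived properties rather than stored fields means the footer cannot drift out of step with the price it was computed from.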

Where is the line between summarising management and challenging them?

All reports accepted the CMA outcome as "benign" and the retail recovery narrative at face value. The agent should be trained to identify where management framing diverges from financial reality.

The problem: The "definition mismatch" excuse for the £21m retail consensus miss (£51m vs £30m) is the clearest example. Rather than investigating why retail underperformed so dramatically, the reports rationalised the discrepancy as a data quality issue. Similarly, accepting "no adverse impact from CMA" without independent analysis is restating PR, not research.

The fix: Build a "Red Flag" module that automatically: (1) compares management language across quarters for shifted narratives, (2) flags when actual results miss consensus by >10% and demands root cause analysis rather than definitional excuses, (3) cross-references management claims against independent data (e.g., CMA ruling text, competitor filings), and (4) explicitly marks which conclusions are management-sourced vs independently derived.

Impact: This is the "Contradiction Engine" concept from the memo. It's also the single most defensible product capability — a tool that makes analysts more skeptical is genuinely differentiated from tools that make them faster.
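Two of the four Red Flag rules are simple enough to sketch directly: the consensus-miss threshold in (2) and the provenance marking in (4). The example uses the retail figures quoted above (£51m consensus vs £30m actual); `Claim`, `consensus_miss`, and `needs_root_cause` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    source: str  # "management" or "independent"


def consensus_miss(actual: float, consensus: float) -> float:
    """Signed miss versus consensus as a fraction; negative means a shortfall."""
    return (actual - consensus) / consensus


def needs_root_cause(actual: float, consensus: float, threshold: float = 0.10) -> bool:
    """Flag any shortfall versus consensus larger than `threshold`."""
    return consensus_miss(actual, consensus) < -threshold


def management_only(claims: list[Claim]) -> list[Claim]:
    """Claims that rest solely on management's own statements."""
    return [c for c in claims if c.source == "management"]


print(f"retail PBT miss: {consensus_miss(30, 51):.1%}")
print("root cause analysis required:", needs_root_cause(30, 51))
```

Rules (1) and (3) need prior-period text and external documents, so they are retrieval problems rather than arithmetic and are not sketched here.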

When should the agent calculate rather than estimate?

All reports estimated the DPS cut as a percentage range (-25-35%) and all were wrong (actual: -43.1%). The agent should derive DPS from the stated 50% payout ratio and forecast EPS, not guess.

The problem: Management explicitly stated a rebase to a 50% payout ratio. Given underlying EPS of 14.8p, the implied DPS is 7.4p — exactly what was delivered. The agent should have calculated this rather than estimating a percentage range.

The fix: When management provides a payout ratio, the agent should: (1) calculate the implied DPS from forecast EPS, (2) compare this to the prior DPS to derive the implied cut, (3) cross-check whether FCF covers the new dividend, and (4) assess buyback implications for EPS accretion. This is arithmetic, not judgment — exactly the type of work an agent should do flawlessly.
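Steps (1) and (2) can be written out directly. A minimal sketch using the 50% payout ratio and 14.8p underlying EPS stated above; the prior-year DPS is not quoted in this document, so the example back-solves it from the stated 43.1% cut, which is an assumption made purely for illustration.

```python
def implied_dps(eps_p: float, payout_ratio: float) -> float:
    """Dividend per share implied by a stated payout ratio (pence)."""
    return eps_p * payout_ratio


def dps_change(new_dps_p: float, prior_dps_p: float) -> float:
    """Fractional change in DPS versus the prior year."""
    return new_dps_p / prior_dps_p - 1


new_dps = implied_dps(eps_p=14.8, payout_ratio=0.50)

# ASSUMPTION: prior DPS back-solved from the stated -43.1% cut, not a sourced figure.
prior_dps = round(new_dps / (1 - 0.431), 1)

print(f"implied DPS: {new_dps:.1f}p")
print(f"implied change vs prior DPS of {prior_dps:.1f}p: {dps_change(new_dps, prior_dps):.1%}")
```

In a live pipeline the prior DPS would be read from the previous annual report, and the same two functions would have produced the cut ahead of the announcement instead of a guessed range.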

Security observations: infrastructure, data segregation, and report delivery

12 findings across 4 categories from passive analysis only. Key concern: the data segregation claim from the podcast ("all user data is owned by the user, walled garden per user") is contradicted by the report delivery architecture.

All findings below are from passive observation of URLs provided as part of this assessment, public DNS records, and HTTP response headers. No active scanning, exploitation, or penetration testing tools were used. This review is presented constructively — as a security-aware assessment of the product's public-facing architecture.

Infrastructure Map (from response headers)

Component	Technology	Evidence	Risk Level
Report hosting	AWS S3 + CloudFront	`server: AmazonS3`, `x-amz-cf-pop` headers	Medium
Marketing site	Framer	`server: Framer/e66ed00` (version exposed)	Low
Product app	Next.js on Render, behind Cloudflare	`x-powered-by: Next.js`, `x-render-origin-server: Render`	Medium
Image assets	Cloudinary (account: `dttjaxqso`)	Image URLs in report HTML	Low
Report template	MJML (email framework)	Microsoft Office conditional comments in HTML source	Info
Legal entity	KernelAI, 125 London Wall, EC2Y 5AS	Report footer	Info

Category 1: Report Authentication & Access Control

FINDING 1: Reports use security-by-obscurity (unguessable URLs, no token auth) The report URLs shared as part of this assessment are served from S3 via CloudFront without session tokens or API keys — access is controlled by URL obscurity rather than authentication. This is a reasonable approach for sharing specific reports (as Primer did with this assessment), but the question is whether all client reports use the same delivery mechanism. If so, any leaked or forwarded URL grants permanent, unlimited access to proprietary analysis.

Consideration: For institutional clients with compliance requirements, time-limited signed URLs or session-gated access would provide stronger assurance. This is a discussion point rather than a vulnerability — the current approach works for intentional sharing but may not satisfy enterprise security audits.
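To make the idea of a time-limited signed URL concrete, here is a generic HMAC-based sketch using only the Python standard library. It is not CloudFront's own signed-URL scheme, which uses key pairs and its own query parameters; `sign_url`, `verify_url`, and the `expires`/`sig` parameter names are assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and expiry, hex encoded."""
    payload = f"{path}?expires={expires_at}".encode()
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, secret: bytes, expires_at: int) -> str:
    """Return the path with an expiry timestamp and signature appended."""
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url: str, secret: bytes, now: Optional[int] = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    current = int(time.time()) if now is None else now
    if current > expires_at:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires_at, secret))
```

With this shape a forwarded link stops working once `expires_at` passes, and editing the path or expiry invalidates the signature.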

FINDING 2: URL structure is predictable and enumerable Report URLs follow a predictable pattern: /{TICKER}/filing_briefing/{DATE}_{TIME}.html. While guessing the exact timestamp requires brute-forcing, the ticker and date components are publicly knowable. An attacker who knows Primer covers PETSP.L and that FY26 results were released on 31 March 2026 has a small search space.

Mitigation observed: S3 bucket policy returns 403 for unknown paths, which limits enumeration. However, the predictable URL structure means that a single leaked URL reveals the naming convention for all reports.

Recommendation: Use random UUIDs in report URLs (e.g., /reports/a3f7c2d1-9b4e-...) rather than ticker/date patterns.

Category 2: Data Segregation vs. Podcast Claims

FINDING 3: Data segregation claim contradicted by report architecture In the Zeus Capital podcast, Smallwood stated: "All of the data inputted by the user is owned by the user... it's walled garden per user rather than per firm."

However, the report delivery architecture contradicts this claim:
• Reports are stored by ticker (/PETSP.L/), not by user or organisation
• No user-scoped paths, tokens, or identifiers appear in any report URL
• Report A (production) and Report B (testing/prod) have identical content-length (49,383 bytes), suggesting the same underlying data generates the same output regardless of user
• The walled-garden claim may apply to the interactive studio agent (where user-specific "Memories" and pinned rules would differentiate output), but it does not apply to the static report delivery layer, where reports appear to be generated per-ticker, not per-user

Risk: If a client believes their analytical customisations (pinned rules, agent memory) are reflected in the delivered report, but the report is actually generated from a shared, user-agnostic pipeline, this creates a mismatch between expectation and reality.

Recommendation: Either ensure reports incorporate user-specific context (making each user's PETSP.L report genuinely different) or clearly communicate that static reports are ticker-level outputs distinct from the personalised interactive experience.

FINDING 4: Test and production reports share the same S3 bucket and CloudFront distribution Reports at /PETSP.L/filing_briefing/ (production) and /testing/PETSP.L/filing_briefing/ (testing) are served from the same domain, bucket, and CDN. The three testing reports were all uploaded simultaneously (identical last-modified: Thu, 02 Apr 2026 16:23:10 GMT), confirming this is a test/evaluation pipeline sharing production infrastructure.

Risk: Commingling test and production data increases the risk of accidental exposure. A misconfigured bucket policy could expose test data (which may include debug information, internal notes, or early-stage analysis) to production users.

Recommendation: Separate S3 buckets and CloudFront distributions for test and production environments.

Category 3: Security Headers & Hardening

FINDING 5: No security headers on reports subdomain The reports.production.primerapp.com responses include zero security headers:
• No Content-Security-Policy — reports could load external scripts or be injected with malicious content
• No X-Frame-Options — reports can be iframe'd by any third-party site (clickjacking risk)
• No X-Content-Type-Options — MIME sniffing attacks possible
• No Strict-Transport-Security — HTTPS not enforced via HSTS
• No Referrer-Policy — report URLs may leak in referrer headers

FINDING 6: No security headers on studio application The studio.primerapp.com application similarly lacks: Content-Security-Policy, X-Frame-Options, X-XSS-Protection, Referrer-Policy, and Permissions-Policy. For a financial application handling proprietary data, this is below industry baseline. The marketing site (Framer) does include Strict-Transport-Security, but the product application does not.
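Findings 5 and 6 can be re-checked from any captured set of response headers. The sketch below compares a header dictionary against the five headers listed under Finding 5; the `RECOMMENDED` tuple and `missing_security_headers` are illustrative names, and the list is the one used in this review, not an exhaustive standard.

```python
# Headers named under Finding 5 of this review.
RECOMMENDED = (
    "Content-Security-Policy",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Strict-Transport-Security",
    "Referrer-Policy",
)


def missing_security_headers(response_headers: dict[str, str]) -> list[str]:
    """Return the recommended headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in RECOMMENDED if name.lower() not in present]


# Header names as observed on the reports subdomain (values abbreviated).
observed = {"server": "AmazonS3", "x-amz-server-side-encryption": "AES256"}
print(missing_security_headers(observed))
```

Running the same check after a configuration change gives a quick pass/fail signal without any active scanning.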

FINDING 7: Full server stack disclosed in response headers An attacker can identify the complete technology stack from a single HTTP request:
• Reports: server: AmazonS3, x-amz-server-side-encryption: AES256
• Studio: x-powered-by: Next.js, x-render-origin-server: Render, server: cloudflare
• Marketing: server: Framer/e66ed00 (including build version)

This gives an attacker a complete map of technologies to target with known CVEs. Standard practice is to remove or generalise server headers.
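Header stripping is usually done in CDN or proxy configuration, but the logic is simple enough to sketch as a filter. The set below contains the disclosing headers observed in this review; `strip_disclosing_headers` is a hypothetical helper, and whether a given platform allows each header to be removed is not something this review tested.

```python
# Header names observed in this review that reveal the technology stack.
DISCLOSING = {"server", "x-powered-by", "x-render-origin-server"}


def strip_disclosing_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers without stack-revealing entries (case-insensitive)."""
    return {name: value for name, value in headers.items() if name.lower() not in DISCLOSING}


studio = {
    "x-powered-by": "Next.js",
    "x-render-origin-server": "Render",
    "server": "cloudflare",
    "content-type": "text/html",
}
print(strip_disclosing_headers(studio))
```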

Category 4: Application Architecture Observations

FINDING 8: Studio app renders full UI shell before authentication Fetching studio.primerapp.com returns the complete application navigation structure (Studio, Library, Templates, Models, Notes, Data, Routines, Coverage, Reports, Calendar, Inbox, Settings) and feature names (AutoYOLO, Memories, Sources) before any authentication check. While this is common in client-side rendered Next.js applications, it exposes the full feature set and UI architecture to unauthenticated users.

Note: This is how we mapped the complete product feature set without having an account.

FINDING 9: Reports generated using email template framework (MJML) The report HTML contains Microsoft Office conditional comments (<!--[if mso]>) characteristic of email templates such as those produced by the MJML framework. This suggests reports may be dual-purpose: served both as web pages and via email delivery. Email-compatible HTML relies heavily on inline styles, which a strict Content-Security-Policy would block unless style-src allows 'unsafe-inline'. This may partly explain the missing headers on the reports subdomain, although security headers are set on the HTTP response and could still be added to the web-served copy.

FINDING 10: Cloudinary account identifier exposed Reports reference images from Cloudinary account dttjaxqso. While Cloudinary has reasonable default security, the account identifier could be used to enumerate uploaded assets if resource list access is not explicitly disabled.

FINDING 11: S3 XML error responses not masked Requesting non-existent paths on the reports domain returns raw S3 XML error responses (<Error><Code>AccessDenied</Code>...<HostId>...</HostId></Error>) including internal request IDs and host identifiers. CloudFront should be configured to return custom error pages rather than proxying S3 error responses.

FINDING 12: No security.txt or vulnerability disclosure policy /.well-known/security.txt returns a 307 redirect (to auth), not a security contact page. For a financial services product, having a published vulnerability disclosure policy and security contact demonstrates maturity and is increasingly expected by institutional clients.
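For reference, RFC 9116 requires a security.txt file to carry a Contact field and an Expires field. The sketch below generates such a file under stated assumptions: the contact address and policy URL are placeholders on example.com, and the function name and 180-day validity window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def build_security_txt(contact: str, policy_url: str, now: datetime,
                       valid_days: int = 180) -> str:
    """Render a minimal security.txt body with the two required fields."""
    # Expires is written as a UTC timestamp; `now` is expected to be timezone-aware.
    expires = (now.astimezone(timezone.utc) + timedelta(days=valid_days))
    lines = [
        f"Contact: {contact}",
        f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"Policy: {policy_url}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values only; a real file would name the actual security contact.
print(build_security_txt(
    contact="mailto:security@example.com",
    policy_url="https://example.com/security-policy",
    now=datetime(2026, 5, 1, tzinfo=timezone.utc),
))
```

The file would be served as plain text at /.well-known/security.txt without requiring authentication.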

Summary & Severity Assessment

#	Finding	Severity	Effort to Fix
1	Security-by-obscurity on report URLs	Medium	Medium (signed URLs)
2	Predictable URL structure	Medium	Low (UUID paths)
3	Data segregation claim vs reality	High	High (architecture change)
4	Test/prod commingled	Medium	Low (separate buckets)
5-6	Missing security headers	Medium	Low (CloudFront/Cloudflare config)
7	Server stack disclosure	Medium	Low (header stripping)
8	UI shell pre-auth render	Low	Medium (SSR auth guard)
9-12	MJML, Cloudinary, S3 errors, security.txt	Low	Low

Two findings are the most likely to surface during institutional client due diligence: unauthenticated report access (Finding 1, rated Medium in the table) and the gap between the data segregation claim and the observable architecture (Finding 3, the only item rated High). Addressing these before scaling the buy-side customer base would be prudent.
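Two of the fixes named in the summary table, signed URLs and UUID paths, can be sketched generically. This is a plain HMAC scheme for illustration only, not CloudFront's or Supabase's own signed-URL mechanism; the secret, path layout, and function names are all assumptions.

```python
import hashlib
import hmac
import uuid

# Hypothetical secret; a real deployment would load this from a secrets manager.
SECRET_KEY = b"example-secret-do-not-use"


def new_report_path() -> str:
    """Build an unguessable report path from a random (version 4) UUID."""
    return f"/reports/{uuid.uuid4()}.html"


def _signature(path: str, expires_at: int, key: bytes) -> str:
    """HMAC-SHA256 over the path and expiry, so neither can be altered."""
    message = f"{path}|{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Append an expiry timestamp and signature to the path."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, key)}"


def verify(path: str, expires_at: int, sig: str, now: int,
           key: bytes = SECRET_KEY) -> bool:
    """Accept only unexpired links whose signature matches path and expiry."""
    if now > expires_at:
        return False
    return hmac.compare_digest(_signature(path, expires_at, key), sig)


link = sign_url(new_report_path(), expires_at=1_800_000_000)
print(link)
```

A random path alone still leaves a leaked link usable indefinitely, which is why the expiry and signature are checked on every request.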

Concept Product: MultiLens — AI Equity Research with Built-In Verification

A working prototype demonstrating the multi-lens approach: extraction verification, cross-statement analysis, contradiction detection, and market context. Built with real PETS.L data. Architecture debated across 4 AI models.

The thesis: Primer's single-agent architecture is a strong starting point, but the real moat in AI equity research is not extraction accuracy — it's verification architecture. A multi-lens system where every conclusion is independently cross-checked creates a product that analysts can actually trust with capital at risk.

I built a working prototype (open MultiLens prototype →) using real Pets at Home data from FMP API, Investegate, and MarketScreener. It demonstrates four lenses:

Lens 1: Extraction Verification
Two models independently extract the same data. Where they agree: high confidence. Where they diverge: flag for human review. In the PETS.L analysis, 5/6 metrics matched perfectly. The DPS estimate diverged — and both models were wrong. This is exactly the kind of error the lens catches.
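The agree-or-flag logic of this lens can be sketched as follows. It is a minimal illustration, assuming each model's output has already been parsed into a metric-to-value dictionary; the tolerance, the function name, and the sample numbers are placeholders and not the PETS.L figures.

```python
import math


def compare_extractions(model_a: dict[str, float], model_b: dict[str, float],
                        rel_tol: float = 0.005) -> dict[str, str]:
    """Label each metric 'agree', 'diverge', or 'missing' across two extractions."""
    result = {}
    for metric in sorted(set(model_a) | set(model_b)):
        if metric not in model_a or metric not in model_b:
            result[metric] = "missing"   # one model failed to extract it
        elif math.isclose(model_a[metric], model_b[metric], rel_tol=rel_tol):
            result[metric] = "agree"     # high confidence
        else:
            result[metric] = "diverge"   # flag for human review
    return result


# Placeholder values for illustration only.
extraction_a = {"revenue": 100.0, "ebitda": 20.0, "dps": 1.30}
extraction_b = {"revenue": 100.0, "ebitda": 20.0, "dps": 1.25}
print(compare_extractions(extraction_a, extraction_b))
```

Agreement raises confidence but does not prove correctness, since two models can share the same mistake. Divergence only signals that a human should check the source.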

Lens 2: Cross-Statement Analysis
Algorithmic checks connecting P&L, Balance Sheet, and Cash Flow. This lens found the three most important facts about Pets at Home that no Primer report mentioned: negative tangible equity (-£8.6m), gross margin compression (120bps), and FCF of £147m (17.8% yield). Not AI magic — just connecting the financial statements.
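The three checks mentioned here reduce to simple arithmetic across the statements. The sketch below uses hypothetical function names. The FCF (£147m) and yield (17.8%) are the figures quoted above, and the market cap is back-solved from them rather than quoted from a data source. The tangible equity and margin inputs are generic placeholders.

```python
def tangible_equity(total_equity: float, goodwill: float,
                    other_intangibles: float = 0.0) -> float:
    """Balance sheet check: equity left after removing intangible assets."""
    return total_equity - goodwill - other_intangibles


def fcf_yield(free_cash_flow: float, market_cap: float) -> float:
    """Cash flow versus market value, as a fraction."""
    if market_cap <= 0:
        raise ValueError("market_cap must be positive")
    return free_cash_flow / market_cap


def margin_change_bps(prior_margin: float, current_margin: float) -> float:
    """P&L check: change in a margin ratio, in basis points."""
    return (current_margin - prior_margin) * 10_000


# Market cap implied by the quoted FCF and yield (in £m); not an observed figure.
implied_market_cap = 147.0 / 0.178
print(round(implied_market_cap, 1))
print(fcf_yield(147.0, implied_market_cap))

# Generic placeholder inputs, chosen only to show the sign conventions.
print(tangible_equity(total_equity=100.0, goodwill=80.0, other_intangibles=30.0))
print(round(margin_change_bps(prior_margin=0.470, current_margin=0.458)))
```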

Lens 3: Contradiction Detection
Red team agent that challenges management framing against financial reality. Flagged 5 contradictions including: "net debt c£20m" omitting £337m of lease liabilities, forward guidance credibility given the FY26 downgrade cycle, and unreconciled overhead savings.
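The net debt contradiction is the simplest of the five to reproduce. A minimal sketch, using the two figures quoted above (in £m) and hypothetical function names:

```python
def lease_adjusted_net_debt(reported_net_debt: float,
                            lease_liabilities: float) -> float:
    """Net debt including lease liabilities."""
    return reported_net_debt + lease_liabilities


def understatement_ratio(reported_net_debt: float,
                         lease_liabilities: float) -> float:
    """How many times larger the lease-adjusted figure is than the headline one."""
    if reported_net_debt <= 0:
        raise ValueError("ratio is only meaningful for positive reported net debt")
    return lease_adjusted_net_debt(reported_net_debt, lease_liabilities) / reported_net_debt


# Figures from the text: "net debt c£20m" and £337m of lease liabilities.
print(lease_adjusted_net_debt(20.0, 337.0))
print(understatement_ratio(20.0, 337.0))
```

A red team agent could raise a flag whenever this ratio exceeds a chosen threshold. Where to set that threshold is a judgment call the source does not specify.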

Lens 4: Market Context
Every report ships with valuation multiples (EV/EBITDA, P/E, FCF yield), analyst consensus (11 analysts, Buy/Hold/Sell split), individual broker targets, and peer context. Without this, a research report is a news summary.

Architecture decision (from 4-model debate):

MVP = Lenses 1 + 2 + 4
The debate unanimously agreed that these three lenses deliver the highest immediate value: data integrity (1+2) and decision-relevance (4). Lens 3 (Contradiction) is the highest-priority fast-follow. A proposed fifth lens (behavioral/positioning data) was explicitly rejected for the MVP because it muddies the fundamental analysis value proposition.

The moat is the Synthesis Engine, not any single lens
Gemini's synthesis identified the key insight: the moat is not a feature Primer can copy. It's the orchestration layer that knows how to run the lenses, compare outputs algorithmically, and generate actionable divergence flags. A self-improving flywheel where every contradiction becomes training data makes the system smarter over time.

UX is the primary risk
All four models flagged that presenting multi-lens analysis without overwhelming the analyst is the biggest product risk. The prototype uses a "summary-first" design: the top-level synthesis is a single paragraph. Key flags are 4 cards. The lenses are tabs you drill into only when you want the evidence. Complexity is hidden until requested.

Why this matters for Primer's roadmap: The multi-lens concept is not a competitor — it's a product evolution. Primer already has the domain expertise, the analyst workflows, and the agent memory. Adding cross-statement verification, contradiction detection, and market context to the existing platform would be the highest-ROI product investment. The interactive agent capabilities (Memories, Routines, programmable rules) become even more powerful when the underlying analysis is independently verified across multiple lenses.

Dimension	Primer	MultiLens
Model architecture	Single model selected by user (GPT-5.5 default)	2+ models run simultaneously; divergence = signal
Verification	None — trust the single model output	Every extraction dual-verified; every conclusion cross-checked
Balance sheet analysis	Not in reports (missed £960m goodwill on PETS)	Auto-pulled via FMP API for every analysis
Contradiction detection	Not implemented	Red team agent challenges management framing against data
Proactive monitoring	User-configured Triggers (reactive)	Radar: automated cron + materiality filter (proactive)
Data cost	Visible Alpha (~$50K/yr)	FMP API (~$30/mo, ~$360/yr), roughly 139x cheaper
Report authentication	None (public S3 URLs)	Signed URLs via Supabase Storage
Agent architecture	"Monolith" single agent (from their JS bundles)	Specialised stateless agents per lens
